CIA AUTOMATIC DATA PROCESSING STAFF


Preface 


This outline report deals with the document/information retrieval
system development element of Project [redacted]
at the end of Phase I of the system development task.




25X1 A2g 


The report covers: 

(1) The results of fact-finding throughout
the DD/I;

(2) The conclusion that a major central reference 
system is required; 

(3) The initial concept of a new central system; 

(4) A suggestion to management that a base docu- 
ment indexing system be urged upon the 
intelligence community and that this indexing 
function be performed once and centrally for 
the members of the community; 

(5) The [redacted] plan for proceeding with the detailed
development of a new document/information 
retrieval system (through Phases II & III); 

(6) A set of general observations of particular 
interest to management; 

(7) Major alternatives open to management; and 

(8) ADPS recommendation.


Note: [redacted] has produced several "depth" papers for its own purposes
which elaborate on the contents of this outline report. These
papers are available in ADPS to persons wishing to peruse them.


PROJECT [redacted]

DOCUMENT/INFORMATION RETRIEVAL SYSTEM DEVELOPMENT TASK
PHASE I OUTLINE REPORT 


I. Document/Information Retrieval System Development Task

A. Four Phases of System Development Task:

Phase I - Fact-Finding and Formulation of the Overall 
Concept of the New System 
(Sept 62 - June 63) 

Phase II - Detailed Systems Design 
(July 63 - June 64)

Phase III - Implementation of Initial Segment 
(July 64 - April 65) 


Phase IV - Implementation of Additional Increments 
(May 65 - ?)
B. Phase I

1. Fact Finding 
a. General 

Personnel Conducting the Survey: 


25X1 A5a1 


4 

4 


ADPS



25X1 A2g 


Scope 

All Offices of the DD/I
150+ components studied
Fact-finding reports prepared on each

Major Targets of [redacted] Fact-Finding

(1) Missions and functions of DD/I components

(2) Information sources used 

(3) Internal processing and files (internal 
to Branch, etc. visited) 

(4) Use and evaluation of external files 

(5) Reports produced

(6) Information needs and problems 
Survey Completed April 1963 

b. Major Factors Bearing on System Development Task 

Volume of Document Receipts 

Multiplicity of DD/I Missions and Interests

Variety and Depth of Info Required from these 
Documents 

Variable Time Requirements: 

For basic intelligence research 

For programmed, shorter-length research

For current intelligence 
Trend toward Current Reporting

c. DD/I Information Resources (Present System) Composed of:

Analyst Files (para. d immediately below)

Central Info System (OCR) (para. e)

Dissemination Services (para. f)

Other Internal and External Services (para. g)

d. Analyst Files 

The Analyst Files are, in fact, the primary DD/I
info retrieval system in terms of: 

Use rate 

Response time 

Indexing and content to meet analyst 
specifications 

Uses 


To check validity of new data and to determine
its effect on what is already known.

To handle immediate, short lead-time ad hoc
queries. Basis for more leisurely research,
also.

Major Strengths 


Readily accessible 

Contain filtered data (reflects specialist/user
judgment)

Tailored to analysts' needs (topic, sequence, 
and index control) 

Ability to control subjects (concepts) according 
to the specific requirements of the analyst 

Major Weaknesses 

Data control largely limited to current interests 
Not readily manipulated 
Limited and partial historical depth

Not ideally accessible to other analysts

Organizations, personalities, areas not
easily controlled

Duplicative processing among DD/I components
File maintenance detracts from analytic time
e. Central System (OCR)

General Role - Back-up to Analyst Files for:
Historical depth
Gaps in analyst file coverage
Routine, long lead time requests
Major Uses

To provide comprehensive recovery for long lead
time research projects

To provide retrieval of data not controlled in 
analyst files 

To provide comprehensive storage and retrieval 
on organizations, personalities, areas 

Major Strengths 

Provides historical depth (institutional memory) 

Comprehensive topic and area coverage 

Multi-access to documents, e.g., date, source, 
topic, area, etc. 

Backstops intelligence gaps in analyst files 
Document repository 
Major Weaknesses 

No single point for all-source retrieval 
Outputs from multiple points not compatible 
STATSPEC


Insufficient emphasis given to open literature
and cables

Slow response

Not sensitive to shifts in intelligence sources
and priority interests

Inadequate geographic coordinate retrieval
Duplicative processing 
f. Dissemination Services 


Manual system 

Minimum of 120 man years/year (rough estimate) 

One million unique documents/year 

10-15 million multiple copies/year 

150-200 components served with specific reading 
requirements 

General analyst satisfaction 


Timely and accurate 
Inefficient and costly 


25X1A5a1 


g. Other Information Retrieval Services 



Agriculture, etc.


25X1 B4d 
FOIAb3b1 


Published bibliographies and indexes: Monthly Index
of Russian Accessions, Referativnyy Zhurnal, ASTIA
Technical Abstract Bulletin, etc.


Files of other agencies:
Dept. of Commerce, NSA,


FTD/AFSC (White Stork),


etc.


FOIAb3b1 


Map Library,
[illegible] rid/dip, etc.


NPIC, 


rpb/ 


Analyst chatter 
2. Central vs De-Centralized System

This is a major decision area for both systems design
and management.

A decision for a de-centralized system would mean the
up-grading and coordination of the Analyst File complex with
near-total dependence upon same and the correlative curtailment
of the central system to a very low use, very slow
response, essentially archival role.

On the other hand, a decision for the continuation of
an up-graded central system, in addition to the Analyst File
system, means that heavy expenditures for a central system
will not only continue but undoubtedly increase, that the
effort to devise an improved central system must continue,
and that eventually the resultant advanced system must be
implemented and the cost and commotion of doing so accepted.

a. De-Centralized System (Analyst Files)

Pros 


Provides primary support to intelligence production 

Proven in practice 

Reflects user needs and judgments 

For majority of uses, is preferred by analysts. 

(Will always exist to some degree.)

Integrated sources (within clearance-level of analyst) 


Cons 

"Personalized" files 
Difficult for others to use 
Lack continuity and consistency 
Difficult to manipulate 

Coverage of all orgs., persons, and areas, etc. not 
feasible 

Number and size would increase without central system 
b. Centralized System

[redacted] concludes a central system is long-run "must"

for systematic doc/info control

If improved, would:

Have higher use rate... thereby increasing the 
return on expenditures; and 

Make inroads into present Analyst Files... 
thereby helping to offset costs 

If accepted as a base index system for the Intelligence
Community (see para. IB4 below), the system
would undoubtedly pay for itself several times over.
3. System Concept

a. Very simple to say:

Central, integrated, machine-supported system to
provide document and information retrieval for the
total DD/I document flow.

b. Characteristics 

What 

All source 

All geographic areas

All topics (persons, places, things, organi- 
zations, subjects) 

Depth indexing 

Direct entry to files (input or querying) 
Single-processing of input 
Single-point retrieval
[illegible] indexing
Manual dissemination

[illegible] random access capability

Initial machine translation/stenowriter
capability

Experimental remote inquiry or display

Intermediate (1966-1967)

Large hardware complex/some advanced hardware 

Manual indexing of hard copy

Some automatic indexing of machine language 
sources 

Some character recognition (experimental) 

Limited remote interrogation and display 

Some automatic dissemination 

Volume machine translation 

Target System (1968 - ?)

Very large and advanced hardware complex, 
including extensive random access capability 

Automatic indexing for major portions of
base recovery system (incl. character
recognition)

Human indexing for special info retrieval 
projects 

Remote interrogation and display 
Automatic dissemination 

Volume machine translation (improved quality) 

c. Elements

(1) Document storage and retrieval

(a) Persons, organizations/installations, and geographic
locations to be stressed

Items of most universal interest to
analysts

Weakest links in Analyst Files

Strongest elements of present Central
System

Volume beyond proper handling via Analyst
Files

(b) Commodities and Subjects to be covered with
less emphasis

Not priority need

[illegible] use in central system

Analyst Files handle concepts (Subjects)
better

(2) Information Storage, Manipulation, and Retrieval 

(a) Correlative to Document Index System via: 
index display 


Synthesis and summarization of index entries 

(b) Special Projects (Language Processing), such as: 


25X1 A2g 
25X1 B4d 


Strategic Facilities Project 


(Project 



(c) Major Automated Information System 


Subject: Targets
Scope: World-wide

Inputs: Machine language files external
to [redacted]

: [redacted] index data (selected)

: Special inputs designed for this
system (For elaboration, see [redacted]
paper, same subject, dated 2 May 63)
25X1 B4b


(d) Computation (Numerical Processing), such as:



(3) Non-Literal [illegible] Processing, such as:


25X1 B5e 

(4) Machine Translation/Stenowriter

(5) Publication Support

(Use of computer for composing, typesetting, etc.)
d. [redacted] troubled by size of task

(1) Complexity of system design

Balanced handling of such variety and volume

Accomplish objectives without undesirable
consequences

(2) Hardware/software limitations

(3) Costs - personnel and budgetary 
e. Full solution will require: 

(1) Development of new techniques 

Index, dissemination, abstract, display, input/ 
output, etc. 

(2) Development of new hardware 

Memory, input/output, character readers, etc. 

(3) Money and people 

Major investments during developmental years. 
Savings in long run? 
4. Ideally, an Intelligence Community Task

a. Ideal approach [illegible] - task should be done centrally
for the Intelligence Community

(1) Community Effort

(a) Design and develop centrally a base doc/info
system for use by community members

(b) Index centrally all docs collected/originated
by Intelligence Community

Some decentralized input but conforming
to base system

Some special-purpose, limited-interest
categories excepted

(c) Provide base retrieval index, or suitable
portions, to community members

(d) Output servicing to be performed by individual
members for its local users

Base system - provided by central organization

Special files, as required - built and
used by individual members

Some output servicing provided by central 
organization 

(e) Initially: doc/info indexing and retrieval 

(f) Eventually: translation, requirements control, 

etc. 

(2) Executive Agent - CIA (or Intelligence Processing 
Center under USIB) 

CIA has most suitable charter

CIA has most experience in large-scale document
systems
systems 

CIA has best/largest personnel base 


25X1 A2g 


CIA already started towards such a system via [redacted]
CIA must try to do anyway for its own needs

Opportunity for CIA management to take
initiative on task of vital interest to CIA,
State, and [illegible]

Bureau of Budget should respond with real
enthusiasm to such an idea (designing and
developing one system instead of multiple
systems; indexing centrally the intelligence
document flow once instead of
separately for each user). BOB could make
fully ample funds available for this task
and still save the taxpayer major sums of
money.

b. Any such central effort for the community will take time,
however.

c. CIA has need to continue [redacted] in the interim:

(1) Nature of [redacted] is consistent with community idea

(2) [redacted] need for resources of community magnitude
5. General Plan for Proceeding with [redacted] System Task ...
(July 1963 - April 1965 and continuing)

a. Design a base doc/info system for total DD/I document
flow ... (July 1963 - June 64 and continuing)

b. Within above context, concurrently design for implementation
of the initial segment of the total system ... (July 63 -
June 64)

Incrementation by source

Will expand to eventual system
Keeps design tuned to real world

c. Fund and shape external R&D of hardware and software if
commercial development of same is not adequate ... (1963 - ?)

Must have new capabilities to accommodate growth
of system

Requirements will be clarified during system design

d. Implement initial segment of new system ... (July 64 - April 65)

e. Expand coverage of new system ... (May 65 - ?)
C. Phase II - Detailed System Design (July 63 - June 64)

1. Personnel:

a. ADPS - continuing from Phase I

b. [redacted] contractor ([illegible]) - continuing from Phase I

c. OCR -

CHIVE has requested middle-level team from OCR to
work full time on Phase II. This team would:

Receive training in EDP

Work directly with [redacted] personnel on Phase II

Learn details of system design

Provide working-level OCR guidance

Collect OCR facts/statistics required by Phase II

Serve as liaison channels to OCR Divisions

Become key OCR people for future operational
implementation of system segments

2. Sub-Tasks of Phase II: 

a. Information Processing - (CIA Team) 

Coverage/ scope 

Index techniques 

Record formats 

Data reduction requirements 

Query logic 

Output requirements

File organizations

Etc. 

b. Program Design - 25X1 A5a1 

Total program concept 
File maintenance and query programs
Utility program system


[illegible]


[illegible]


c. Hardware Study and Recommendations



[illegible]

Micro-image store
Memory


File conversion


Machine-readable
Non-machine-readable
Reserve files

d. Design Data Collection - (CIA Team)
Data flow 


Processing rates 
User attitudes/needs 
Present system statistics 

e. Training - (CIA Team) 

[redacted] staff (incl. personnel detailed from OCR)
User personnel 

System orientation 
Task training 

f. Implementation Planning - (CIA Team)

g. Management/Coordination - (JOINT Team)



25X1 A5a1 


25X1 A2g 
D. Phase III - Initial Implementation (July 64 - April 65)

1. Sub-Tasks

a. Write programs

b. Install [illegible] hardware

c. Commence operation of system

Input processing
File maintenance
Output services

d. Training

2. Implement by Source Increments

Initial sources: SI and T/KH
Why SI and T/KH?

Now handled by single organizational component of
OCR (Special Register). Thus:

Easier to study

Organizational dislocation within OCR resulting
from implementation is minimized

Present SR system most similar to [redacted] concept

Present SR responsibilities for document reference 
service approximate microcosm of OCR 

SR personnel most familiar with machine procedures 

Both sources of significant intelligence worth 

E. Phase IV - Expansion of Initial Increment (May 65 - ?) 

1. Addition of new sources (e.g., CS reports, 00-B’s, S & T 
literature) 

2. System configuration will change with experience 
II. General Observations
A. Concept

Prematurely defined, at this point in time
Tentative - will change

Proportions of CHIVE objectives are beyond any present-day
system and beyond present-day hardware

Success is far from guaranteed

Even if many of the advanced elements of the concept
did not materialize, advantages will accrue

B. Why a System?

Control of more material 

Deeper/more flexible index

Rapid document retrieval 

Intensive information retrieval 

Single service point for document retrieval 

Single input processing 

Integrated output 

Postures the central system to grow with EDP (where future
machine-support capabilities lie)

Eventual automation of some functions now done manually
C. Functions of OCR Affected/Not Affected by [redacted] System
1. Affected: 

Indexing and retrieval 
Machine support 
Dissemination 

Document storage and retrieval 
Photo storage and retrieval 


25X1 A2g 


25X1 A2g 
Extracting/abstracting services

Publications procurement accounting and control

2. Not Affected:

Book cataloging and shelving

Publications and photo procurement

Library reference and circulation services (non-document)

Distribution services, i.e., the mailroom functions
Motion picture presentations

Liaison Staff

Historical Intelligence Collection

D. Organizational Effects on OCR

Interim - New system will slowly absorb people and functions

- Will constitute new element; traditional elements
continue

Eventual - Present OCR Divisions will largely disappear

- Input Divisions within [redacted] will be [illegible] by
Geographic Region

- Service Division

- Systems Development Division

- Programming Division

- Computer Operations Division

- New non-[redacted] Division(s) for non-[redacted] functions

E. Schedule of Effects on OCR

Phase I - Fact-Finding and Systems Concept ... (Sept 62 - June 63)
Effect: None

Phase II - Detailed Systems Design ... (July 63 - June 64)

Effect: None, except OCR System Trainees join
with [redacted]
Phase III - Initial implementation ... (July 64 - April 65)

Effect: [illegible] index, reference, and punch
personnel phase over to new system

: [illegible] of old to new files
[illegible] (limited)

: [illegible] of old system
[illegible] by EAM

Phase IV - Extension of System ... (May 65 - ?)

Effect: File maintenance/index/reference
personnel from IR/BR/GR/DD/LY
(Intellofax) phase into new system

: [illegible] personnel in MD phase over

: File conversion accomplished (limited)

: Unconverted portions of old system
to continue operations


25X1 A2g 
25X1 A2g 





F. Single Service Point Idea 

Implementation of initial segment of [redacted] adds one more--
unless OCR develops now a single service point to tap for the
consumer all pertinent OCR resources.


G. Organization of OCR by Geographic Region Prior
to Implementation


Organization of OCR by Region before [redacted] implementation
would foster development of single OCR service point, would
lead to [redacted] increments by Region as well as source, and
would facilitate successive expansions of [redacted].


H. State-of-the-Art Implications

Conventional human indexing pushed to limit

EAM support pushed to limit

EDP offers hope through new capabilities

Even with EDP, R&D in hardware and software a "must" to
expand capabilities to meet expanded [redacted] requirements in
Phase IV.

Machine indexing inferior to human indexing today

But, offers speed, consistency, and eventually perhaps
comparable quality

Total document retrieval system for DD/I appears not feasible
with today's equipment

Eventual DD/I system will be based on next 3-5 years of [redacted]
implementation experience and on R&D in industry

I. Budgetary Implications 

Development and implementation costs will be heavy 

Hardware Development (Government R&D support may be
required)

Systems/Techniques Development (Government support almost
certainly required)

Parallel Systems Operation

Conversion 

Eventual system more economical per item of data controlled 

J. Manpower Implications 

By single input handling of documents, hope to gain manpower
to permit:

Deeper indexing 

Broader coverage 

Greater effort on output 

K. Conversion Implications

It is desirable to convert present OCR machine files, if
feasible. EAM data may not be compatible with EDP files,
however.

--A study question for Phase II

L. Security Implications 

"All-Source" clearance for all personnel operating the CHIVE 
system 
All-Source data file:


[illegible]

[illegible]

however)


[illegible]


25X1 A2g 


security classification code 



